Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.
upwork.com | 2026-06-01
Public Web Data Intake System Implementation
Client: United States, member since 2026-03-25
Price: ****
Problem: Need for maintainable, production-ready scrapers to populate an existing research pipeline from diverse public web sources.
Existing: Core repository, database schema, stub source adapters, and ingestion flow (Python + Supabase).
Specifications:
[Target]: Public directories, vendor pages, documentation, blogs, news, public databases, APIs, RSS, paginated search pages
[Method]: Incremental ingestion, deduplication via content hashing, pagination handling, rate limiting, retries
[Stack]: Python, Supabase/Postgres, requests, httpx, BeautifulSoup, trafilatura, scrapy, playwright
[Format]: Structured records containing source URL, title, raw text, metadata, timestamps, and content hashes
[Security]: Lawful collection of public data only; no authentication bypass, paywall circumvention, or CAPTCHA solving
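
The [Format] and [Method] items above could be sketched roughly as follows. This is a minimal illustration, not the client's schema: the class name `SourceRecord`, the helper `hash_content`, the field names, and the choice of SHA-256 over whitespace-normalized text are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_content(raw_text: str) -> str:
    """Return a SHA-256 hex digest of the text with whitespace collapsed.

    Collapsing whitespace is an assumed normalization, so that trivial
    reflows of the same page produce the same hash.
    """
    normalized = " ".join(raw_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class SourceRecord:
    """Hypothetical record shape mirroring the [Format] line."""
    source_url: str
    title: str
    raw_text: str
    metadata: dict = field(default_factory=dict)
    fetched_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Derived from raw_text, so it is not accepted as a constructor argument.
    content_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.content_hash = hash_content(self.raw_text)


record = SourceRecord(
    source_url="https://example.com/docs/page",
    title="Example page",
    raw_text="Hello   world\n",
    metadata={"source": "example"},
)
```

Hashing the extracted text rather than the raw HTML is one possible design choice: it keeps the hash stable when only page chrome changes, at the cost of missing markup-only edits.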
Workflow:
1. Review existing repository and source-adapter interface
2. Implement production-quality source collector
3. Integrate collector output with Supabase storage
4. Validate deduplication on repeat execution
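
Step 4 could be checked with something like the sketch below, which runs the same ingestion twice and confirms the second pass inserts nothing. The `InMemoryStore` class is a hypothetical stand-in for a Supabase/Postgres table with a unique constraint on the content hash; `ingest` and `hash_content` are illustrative names, not part of the existing repository.

```python
import hashlib


def hash_content(raw_text: str) -> str:
    """SHA-256 hex digest of whitespace-normalized text (assumed scheme)."""
    normalized = " ".join(raw_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Hypothetical stand-in for a table keyed uniquely on content_hash."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def insert_if_new(self, record: dict) -> bool:
        """Insert the record unless its hash is already stored."""
        key = record["content_hash"]
        if key in self.rows:
            return False
        self.rows[key] = record
        return True


def ingest(pages: list[tuple[str, str, str]], store: InMemoryStore) -> int:
    """Store each (url, title, text) page and return how many were new."""
    inserted = 0
    for url, title, text in pages:
        record = {
            "source_url": url,
            "title": title,
            "raw_text": text,
            "content_hash": hash_content(text),
        }
        if store.insert_if_new(record):
            inserted += 1
    return inserted


pages = [
    ("https://example.com/a", "Page A", "First article body."),
    ("https://example.com/b", "Page B", "Second article body."),
]
store = InMemoryStore()
first_run = ingest(pages, store)   # both pages are new
second_run = ingest(pages, store)  # identical content, nothing new
```

Against a real database, the same check would rely on a uniqueness constraint plus an insert that skips conflicts, rather than an in-process dictionary.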